Speech 2 Speech
Interspeech 2024
Voice AI apps
Speech processing
Speech engineering / acoustic engineering
ElevenLabs
https://elevenlabs.io/docs/speech-synthesis/speech-to-speech
https://github.com/huggingface/speech-to-speech
Hume AI
https://www.hume.ai/
Recent speech language models
https://drive.google.com/file/d/1O5PKFl6fhLXyZVCdFmbmfDFiRGQeds_8/view
GLM-4-Voice
https://github.com/THUDM/GLM-4-Voice
Trying out J-Moshi
https://note.com/schroneko/n/n6b7a95742ab2
Moshi: a speech-text foundation model for real-time dialogue
https://arxiv.org/abs/2410.00037
Soundwave: Less is More for Speech-Text Alignment in LLMs
https://arxiv.org/abs/2502.12900
Crossing the uncanny valley of conversational voice
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice
Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
https://arxiv.org/html/2505.15670v1